
Abstract

A $k$-dissimilarity $D$ on a finite set $X$, $|X| \geq k$, is a map from the set of size $k$ subsets of $X$ to the real numbers. Such maps naturally arise from edge-weighted trees $T$ with leaf-set $X$: Given a subset $Y$ of $X$ of size $k$, $D(Y)$ is defined to be the total length of the smallest subtree of $T$ with leaf-set $Y$. In case $k = 2$, it is well-known that 2-dissimilarities arising in this way can be characterized by the so-called "4-point condition". However, in case $k > 2$ Pachter and Speyer recently posed the following question: Given an arbitrary $k$-dissimilarity, how do we test whether this map comes from a tree? In this paper, we provide an answer to this question, showing that for $k \geq 3$ a $k$-dissimilarity on a set $X$ arises from a tree if and only if its restriction to every $2k$-element subset of $X$ arises from some tree, and that $2k$ is the least possible subset size to ensure that this is the case. As a corollary, we show that there exists a polynomial-time algorithm to determine when a $k$-dissimilarity arises from a tree. We also give a 6-point condition for determining when a 3-dissimilarity arises from a tree, that is similar to the aforementioned 4-point condition.

1 Introduction

In phylogenetics, as well as other areas making use of classification techniques, many distance-based methods for constructing trees are based on the following fundamental observation. For $X$ a non-empty finite set, a graph-theoretical tree $T = (V, E)$ with leaf-set $X \subseteq V$ and non-negative edge-weighting $\omega : E \to \mathbb{R}_{\geq 0}$ can be encoded in terms of the restriction of the pairwise dissimilarity $d_{(T,\omega)}$ to $X$, where $d_{(T,\omega)}(u,v)$ denotes the length of the shortest path in $T$ between $u$ and $v$ ($u, v \in V$). In other words, the tree $T$ can be completely recovered from the matrix of pairwise values $(d_{(T,\omega)}(x,y))_{x,y \in X}$. Such dissimilarities are commonly called "tree metrics" and there is an extensive literature concerning their properties (see e.g. Semple and Steel (2003) and Gordon (1987) for overviews).

Figure 1: (a) A phylogenetic tree $T = (V,E)$ on $X = \{x_1, x_2, \ldots, x_6\}$. (b) A non-negative edge-weighting $\omega$ for the phylogenetic tree $T$ in (a). The edges in the smallest subtree of $T$ containing the set $Y := \{x_1, x_4, x_5\}$ are drawn bold and their total weight is $D^3_{(T,\omega)}(Y) = 11$. (c) A weighted, rooted phylogenetic tree $(T,\rho,\omega)$ on $X$ with an equidistant edge-weighting $\omega$. The edges whose weights contribute to the value $D^3_{(T,\omega)}(Y') = 3$ for a 3-element subset $Y'$ containing $x_1$ and $x_2$ are drawn bold.

Various methods have been proposed for constructing trees that exploit this observation. These essentially work by projecting an arbitrary pairwise dissimilarity onto some "nearby" tree metric (see e.g. Felsenstein (2003); de Soete (1983)). Even so, it is well-known that such methods can suffer from the fact that pairwise distance estimates involve some loss of information (see e.g. page 176 in Felsenstein (2003)). As a potential solution to this problem, Pachter and Speyer (2004) proposed using $k$-wise distance estimates, $k \geq 3$, to reconstruct trees, an approach which was subsequently implemented in Levy, Yoshida, and Pachter (2006) (see also Grishin (1999) where a related idea was investigated). Their rationale was that $k$-wise estimates are potentially more accurate since they can capture more information than pairwise distances, a point that was also made in Chapter 12 of Felsenstein (2003).



To describe Pachter and Speyer's approach, recall that a phylogenetic tree (on $X$) is a graph-theoretical tree $T = (V, E)$ in which every non-leaf vertex has degree at least three and whose leaf-set is $X$ (cf. Figure 1(a)). In case a real-valued weight $\omega(e)$ is associated to every edge $e$ of $T$, we call $T$ a weighted phylogenetic tree, and we usually denote such a tree by $(T,\omega)$. Now, for any $k$-element subset $Y \subseteq X$, $k \geq 2$, let $D^k_{(T,\omega)}(Y)$ denote the total edge-weight of the smallest subtree of $T$ with leaf-set $Y$ (cf. Figure 1(b)). Note that this quantity is sometimes called the "phylogenetic diversity" of $Y$ (see e.g. Faith (1992) and Steel (2005)) and that, for $k = 2$, $D^2_{(T,\omega)}(\{x,y\}) = d_{(T,\omega)}(x,y)$ for all $x, y \in X$.
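To make the definition of $D^k_{(T,\omega)}$ concrete, the following Python sketch computes the total weight of the smallest subtree connecting a given set of leaves. It relies on the observation that an edge belongs to this subtree precisely when both sides of the edge contain an element of the subset. The edge-list encoding, the function name `subtree_weight` and the small tree `EXAMPLE_TREE` are assumptions made for illustration only and do not appear in the paper.

```python
from collections import defaultdict


def subtree_weight(edges, subset):
    """Total weight of the smallest subtree of a tree that connects `subset`.

    edges  -- iterable of (u, v, weight) triples describing a tree
    subset -- non-empty collection of vertices (typically k leaves)
    """
    subset = set(subset)
    adjacency = defaultdict(list)
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))

    # Root the tree at a member of the subset and record a discovery order
    # in which every vertex appears after its parent.
    root = next(iter(subset))
    parent = {root: (None, 0)}
    order = [root]
    stack = [root]
    while stack:
        u = stack.pop()
        for v, w in adjacency[u]:
            if v not in parent:
                parent[v] = (u, w)
                order.append(v)
                stack.append(v)

    # below[v] counts the members of the subset in the part of the tree below v.
    below = {v: int(v in subset) for v in order}
    total = 0
    for v in reversed(order):  # children are processed before their parents
        p, w = parent[v]
        if p is None:
            continue
        # The edge {p, v} is needed exactly when some member lies below v,
        # because the root (itself a member) lies on the other side.
        if below[v] > 0:
            total += w
        below[p] += below[v]
    return total


# Hypothetical example: leaves a, b, c, d and interior vertices u, v.
EXAMPLE_TREE = [("a", "u", 1), ("b", "u", 2), ("u", "v", 3), ("c", "v", 4), ("d", "v", 5)]
```

For instance, `subtree_weight(EXAMPLE_TREE, {"a", "b", "c"})` sums the weights of all edges except the pendant edge at `d`, and for a 2-element subset the function returns the path length between the two leaves, in line with the remark on the case $k = 2$.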



In Pachter and Speyer (2004), the following result is proven: 



Theorem 1. Let $(T,\omega)$ be a weighted phylogenetic tree on $X$ with $\omega$ non-negative and $\omega(e) > 0$ for every edge $e$ of $T$ that is not incident to a leaf, $|X| = n$, and $k \geq 2$ be some integer. If $n \geq 2k - 1$, then $(T,\omega)$ is determined by the map $D^k_{(T,\omega)}$ (and it is not if $2k - 2 = n > 2$).

In other words, just as in the case $k = 2$, for $k \geq 3$ it is possible to recover $(T,\omega)$ from the function $D^k_{(T,\omega)}$ that maps the set of subsets of $X$ of size $k$ (denoted $\binom{X}{k}$) to $\mathbb{R}$. Here we call any map $D : \binom{X}{k} \to \mathbb{R}$ a $k$-dissimilarity. Note that 3-dissimilarities have been investigated, for example, in Hayashi (1972), Joly and Le Calve (1995) and Heiser and Bennani (1997), and arbitrary $k$-dissimilarities in Deza and Rosenberg (2000) and Warrens (2010), under names such as $k$-way dissimilarities, $k$-way distances and $k$-semimetrics (see also Bandelt and Dress (1994) for related work).



In this paper we shall provide a solution to the following problem raised in Pachter and Speyer (2004):

'However, if we are simply given a $k$-dissimilarity map $D : \binom{X}{k} \to \mathbb{R}$, we do not know how to test whether this map comes from a phylogenetic tree.'



Note that Dress and Steel (2007) study the related problem of characterizing when a map $D$ from the set of subsets of $X$ of size at most $k$ into some Abelian group $G$ can be represented by a phylogenetic tree on $X$ whose edges are assigned elements from $G$. However, we consider subsets of $X$ of size precisely $k$, leading to a quite different characterization.

In order to state the main result of this paper, we first recall some more definitions concerning phylogenetic trees. A rooted phylogenetic tree (on $X$) is a tree $T = (V,E)$ with (i) a distinguished vertex $\rho$, called the root of $T$, that has degree at least 2, (ii) leaf-set $X$ and (iii) no vertex in $V \setminus (X \cup \{\rho\})$ with degree less than 3. In case a real-valued weight $\omega(e)$ is associated to every edge $e \in E$, we call $T$ a weighted, rooted phylogenetic tree, and denote it by $(T,\rho,\omega)$. Note that, for such a tree, we define the maps $d_{(T,\omega)}$ and $D^k_{(T,\omega)}$ in the same way as for (unrooted) phylogenetic trees (cf. Figure 1(c)). In addition, we call an edge-weighting $\omega$ of $T$ equidistant if (i) $d_{(T,\omega)}(x,\rho) = d_{(T,\omega)}(x',\rho)$ for all $x, x' \in X$, and (ii) $d_{(T,\omega)}(x,u) \leq d_{(T,\omega)}(x,v)$ for all $x \in X$ and any $u, v \in V$ that lie on the path from $x$ to $\rho$ in $T$ which first meets $u$ and then $v$ (cf. Figure 1(c)). Such weightings commonly arise when modeling sequence evolution assuming a molecular clock (see e.g. Felsenstein (2003)).

Now, we call a $k$-dissimilarity $D$ treelike if there exists a weighted phylogenetic tree $(T,\omega)$ with $\omega$ non-negative such that $D = D^k_{(T,\omega)}$ holds, and we call $D$ equidistant if there exists a weighted, rooted phylogenetic tree $(T,\rho,\omega)$ on $X$ with $\omega$ equidistant such that $D = D^k_{(T,\omega)}$ holds. In this paper, we shall prove the following:



Theorem 2. Let $k \geq 2$ and $D$ be a $k$-dissimilarity map on a set $X$ with $|X| \geq 2k$. Then $D$ is treelike/equidistant if and only if the restriction of $D$ to every $2k$-element subset of $X$ is treelike/equidistant. Moreover, for all $k \geq 3$ there exist $k$-dissimilarity maps whose restrictions to every $(2k-1)$-element subset of $X$ are treelike/equidistant but that are not treelike/equidistant.



Note that for the case $k = 2$ this result is well-known (see e.g. Semple and Steel (2003, Theorem 7.2.5 and Corollary 7.2.7)).

After presenting some preliminaries in the next section, we prove Theorem 2 in Sections 3 and 4. As a corollary of Theorem 2 we also show that, for fixed $k > 2$, there is an algorithm with run-time that is bounded by a polynomial in $|X|$ to decide if an arbitrary $k$-dissimilarity $D$ is treelike (Corollary 1). It would be interesting to know if such algorithms can be found that have good run-time bounds for $k \geq 3$, such as those that have been devised for $k = 2$ (see e.g. Culberson and Rudnicki (1989), Bandelt (1990)). More generally, it might also be of interest to use Theorem 2 to help devise new methods to construct trees from $k$-dissimilarities such as the one described in Levy et al. (2006).



Note that for $k = 2$ the bound $2k = 4$ given in the second sentence of Theorem 2 is sharp for treelike dissimilarities, but that it can be improved to $2k - 1 = 3$ for equidistant dissimilarities (see e.g. Semple and Steel (2003, Theorem 7.2.5)). Although this is not the case for $k \geq 3$, in Section 5 we shall prove that under certain circumstances it may still be possible to recover a tree from a $k$-dissimilarity $D$ on $X$ in case it is equidistant on every $(2k-1)$-element subset of $X$ (see Theorem 5).

We conclude the paper by considering 3-dissimilarities in more detail. It is well-known (see e.g. Gordon (1987) and Semple and Steel (2003)) that treelike and equidistant 2-dissimilarities can be characterized in terms of the 4-point and ultrametric condition, respectively (for more details see Section 6). Thus, for $k \geq 3$, we can ask for similar "$m$-point" conditions that characterize treelike/equidistant $k$-dissimilarities. This question has been studied in Rubei (2011) for the case $k = 3$, where a recursive characterization is provided, and related problems are considered in Bocci and Cools (2009) in the context of tropical geometry. In addition, a necessary (but not sufficient) $(k+2)$-point condition is given for the general case in Pachter and Speyer (2004, p. 618).
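As a reminder of what the 4-point condition asks for, the sketch below tests it in its standard textbook form, namely that for any four elements the two largest of the three sums $\delta(a,b)+\delta(c,d)$, $\delta(a,c)+\delta(b,d)$ and $\delta(a,d)+\delta(b,c)$ coincide. This formulation is an assumption taken from the general literature, since the paper defers the details to Section 6; the sketch also only inspects quadruples of distinct elements, whereas the usual statement allows repeated elements as well. The function names and the two example tables are hypothetical.

```python
from itertools import combinations


def satisfies_four_point(d, points, tol=1e-9):
    """Check the 4-point condition on all 4-element subsets of `points`.

    d -- function returning the (symmetric) dissimilarity of two points
    """
    for a, b, c, e in combinations(points, 4):
        sums = sorted([d(a, b) + d(c, e), d(a, c) + d(b, e), d(a, e) + d(b, c)])
        # The two largest of the three pairings must coincide.
        if sums[2] - sums[1] > tol:
            return False
    return True


def from_table(table):
    """Turn a dict keyed by 2-element frozensets into a distance function."""
    return lambda x, y: table[frozenset((x, y))]


# Hypothetical path lengths in a tree with leaves a, b, c, d.
TREE_METRIC = {
    frozenset(pair): value
    for pair, value in [
        (("a", "b"), 3), (("a", "c"), 8), (("a", "d"), 9),
        (("b", "c"), 9), (("b", "d"), 10), (("c", "d"), 9),
    ]
}

# The same table with one entry perturbed, which breaks the condition.
NOT_TREE_METRIC = dict(TREE_METRIC)
NOT_TREE_METRIC[frozenset(("a", "d"))] = 11
```

Here `satisfies_four_point(from_table(TREE_METRIC), "abcd")` holds because the three sums are 12, 18 and 18, while the perturbed table yields 12, 18 and 20.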

In the last section, we provide explicit 6-point characterizations for 3-dissimilarities that are treelike/equidistant, which can be regarded as generalizations of the 4-point/ultrametric conditions (Theorem 7). We conclude with a short discussion as to why finding similar conditions for $k \geq 4$ appears to be somewhat more challenging.



2 Preliminaries on phylogenetic trees 

For the remainder of this paper, $X$ will always denote a non-empty, finite set. Also, for a $k$-dissimilarity $D : \binom{X}{k} \to \mathbb{R}$ and $\{x_1, x_2, \ldots, x_k\} \in \binom{X}{k}$, we will write $D(x_1, x_2, \ldots, x_k)$ instead of $D(\{x_1, x_2, \ldots, x_k\})$.

We now recall some further definitions concerning phylogenetic trees (for more details see Semple and Steel (2003)). Let $T = (V,E)$ be a phylogenetic tree on $X$. A vertex $v \in V \setminus X$ is called an interior vertex of $T$. The set of leaves of $T$, that is, the set $X$, is also denoted by $L(T)$. Recall that it is assumed that all interior vertices have degree at least three. An edge $e \in E$ is called pendant if it is incident to a leaf of $T$. All other edges are called interior edges.

Now, two phylogenetic trees $T_1 = (V_1,E_1)$ and $T_2 = (V_2,E_2)$ on the same set $X$ are isomorphic if there exists a bijective map $\iota : V_1 \to V_2$ such that $\iota(x) = x$ holds for all $x \in X$ and $\{u,v\} \in E_1$ if and only if $\{\iota(u), \iota(v)\} \in E_2$ for any two distinct $u, v \in V_1$. In case we also have edge-weightings $\omega_i : E_i \to \mathbb{R}$, $i \in \{1,2\}$, the weighted phylogenetic trees $(T_1,\omega_1)$ and $(T_2,\omega_2)$ are isomorphic if, in addition, $\omega_1(\{u,v\}) = \omega_2(\{\iota(u), \iota(v)\})$ holds for every edge $\{u,v\} \in E_1$. Note that interior edges with weight 0 can give rise to non-isomorphic weighted phylogenetic trees that induce the same $k$-dissimilarity. Therefore, in the following we will always implicitly assume that in any weighted phylogenetic tree interior edges are assigned positive weights. We call such edge-weightings interior-positive, for short.

Figure 2: The weighted phylogenetic tree $(T(M),\omega(M))$ arising from the weighted, rooted phylogenetic tree in Figure 1(c) for $M = 2$.

We also apply the above terminology to (weighted) rooted phylogenetic trees with the following minor adaptations. For two rooted phylogenetic trees $T_1$ and $T_2$ with roots $\rho_1$ and $\rho_2$ to be isomorphic we require, in addition, that $\iota(\rho_1) = \rho_2$ holds. Note that in a weighted, rooted phylogenetic tree $(T,\rho,\omega)$ with $\omega$ equidistant, every interior edge has a non-negative weight while pendant edges might have negative weights (cf. Figure 1(c)). Again, to avoid non-isomorphic weighted, rooted phylogenetic trees giving rise to the same $k$-dissimilarity, we always assume that the edge-weightings are interior-positive. A rooted phylogenetic tree $T = (V,E)$ on $X$ with root $\rho$ is binary if every vertex in $V \setminus (X \cup \{\rho\})$ has degree precisely three and $\rho$ has degree two.

Next note that for every weighted, rooted phylogenetic tree $(T,\rho,\omega)$ with, not necessarily non-negative, equidistant edge-weighting $\omega$ there exists a constant $M > 0$ such that the edge-weighting $\omega_M$, that assigns weight $\omega(e)$ to every interior edge $e$ and weight $\omega(e) + M$ to every pendant edge $e$, is also equidistant and non-negative. Thus, given a weighted, rooted phylogenetic tree $(T,\rho,\omega)$ on $X$ with $\omega$ equidistant, we can construct, for any sufficiently large constant $M > 0$, a weighted phylogenetic tree $(T(M),\omega(M))$ on $X$ with $\omega(M)$ non-negative and interior-positive as follows: If $\rho$ has degree at least three, then put $T(M) = T$ and $\omega(M) = \omega_M$. Otherwise, delete $\rho$ and connect the two vertices $u$ and $v$ adjacent to $\rho$ by a new edge with weight $\omega_M(\{\rho,u\}) + \omega_M(\{\rho,v\})$ (cf. Figure 2). Note that if $M$ is known, we can completely recover $(T,\rho,\omega)$ from $(T(M),\omega(M))$.
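The construction of $(T(M),\omega(M))$ can be sketched in a few lines of Python. The edge-list encoding and the function name `unrooted_version` are assumptions made for illustration; the sketch simply adds $M$ to every pendant edge and, when the root has degree two, replaces the two root edges by a single edge carrying the sum of their shifted weights. It does not check that $M$ is large enough for the result to be non-negative.

```python
def unrooted_version(edges, root, leaves, M):
    """Return the edge list of (T(M), omega(M)) for a rooted, weighted tree.

    edges  -- list of (u, v, weight) triples
    root   -- the root vertex
    leaves -- collection of leaf vertices
    M      -- constant added to every pendant edge
    """
    leaves = set(leaves)
    # omega_M: pendant edges gain M, interior edges keep their weight.
    shifted = [
        (u, v, w + M if (u in leaves or v in leaves) else w)
        for u, v, w in edges
    ]
    at_root = [e for e in shifted if root in (e[0], e[1])]
    if len(at_root) != 2:
        # Root of degree at least three: T(M) = T and omega(M) = omega_M.
        return shifted
    # Root of degree two: delete it and join its two neighbours directly.
    rest = [e for e in shifted if root not in (e[0], e[1])]
    (u1, v1, w1), (u2, v2, w2) = at_root
    a = v1 if u1 == root else u1
    b = v2 if u2 == root else u2
    return rest + [(a, b, w1 + w2)]


# Hypothetical equidistant example: root r, interior vertex u, leaves a, b, c.
ROOTED_EXAMPLE = [("r", "u", 1), ("u", "a", 1), ("u", "b", 1), ("r", "c", 2)]
```

With $M = 2$ the example yields a star-like tree with centre `u`, pendant weights 3 at `a` and `b`, and a merged edge of weight $1 + (2 + 2) = 5$ towards `c`.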

For every rooted phylogenetic tree $T = (V,E)$ on $X$ with root $\rho$ there is a natural partial ordering $\leq_T$ on $V$ with unique minimal element $\rho$ defined by $v \leq_T w$ if and only if $v$ is a vertex of the unique path from $w$ to $\rho$ in $T$. The rooted subtree $T_v$ of $T$ induced by $v \in V$ has vertex set $\{u \in V : v \leq_T u\}$ and root $v$. In addition, for any equidistant edge-weighting $\omega$ of $T$, we define the height $h_{(T,\omega)}(v)$ of $v$, also referred to as the height of $T_v$, as the value $d_{(T,\omega)}(v,x)$ for any leaf $x$ of $T_v$. Note that this height is well-defined in view of the fact that $\omega$ is equidistant.

Finally, for rooted, as well as unrooted, phylogenetic trees $T$ on $X$ we denote, for any subset $Y \subseteq X$, the smallest subtree of $T$ containing the vertices in $Y$ by $T|_Y$ and refer to it as the restriction of $T$ to $Y$. (To formally view $T|_Y$ as a phylogenetic tree on $Y$, we suppress any vertices of degree 2.) In case $T$ is rooted, we also consider $T|_Y$ as a rooted phylogenetic tree where we distinguish the minimal element in the vertex set of $T|_Y$ with respect to the partial order $\leq_T$ as the root of the restriction. And, in case $T$ has an edge-weighting $\omega$, we consider $T|_Y$ as a weighted tree with edge-weighting $\omega|_Y$ obtained by restricting $\omega$ to the edge set of $T|_Y$.

3 Determining trees 

We begin this section by stating a uniqueness theorem that will be useful later: 

Theorem 3. For every integer $k \geq 2$ and every set $X$ with at least $2k - 1$ elements we have:

(i) Two weighted phylogenetic trees $(T_1,\omega_1)$ and $(T_2,\omega_2)$ on $X$ with $\omega_i$ non-negative and interior-positive, $i \in \{1,2\}$, are isomorphic if and only if $D^k_{(T_1,\omega_1)} = D^k_{(T_2,\omega_2)}$ holds.

(ii) Two weighted, rooted phylogenetic trees $(T_1,\rho_1,\omega_1)$ and $(T_2,\rho_2,\omega_2)$ on $X$ with $\omega_i$ equidistant and interior-positive, $i \in \{1,2\}$, are isomorphic if and only if $D^k_{(T_1,\omega_1)} = D^k_{(T_2,\omega_2)}$ holds.



Note that for $k = 2$ parts (i) and (ii) of this theorem are well-known (see e.g. Semple and Steel (2003, Theorem 7.1.8)). Moreover, part (i) is just a restatement of Theorem 1 above due to Pachter and Speyer, and part (ii) immediately follows from part (i) by considering the weighted phylogenetic trees $(T_1(M),\omega_1(M))$ and $(T_2(M),\omega_2(M))$ for some sufficiently large constant $M > 0$.

We now prove the first part of Theorem 2.

Theorem 4. Let $k \geq 2$ and $D$ be a $k$-dissimilarity map on a set $X$, $|X| \geq 2k$.

(i) $D$ is treelike if and only if the restriction of $D$ to every $2k$-element subset of $X$ is treelike.

(ii) $D$ is equidistant if and only if the restriction of $D$ to every $2k$-element subset of $X$ is equidistant.



Proof. (i) For $k = 2$ this is well-known (see e.g. Semple and Steel (2003)). So we shall assume in the following that $k \geq 3$ holds. Clearly, if $D$ is treelike, then also the restriction to every $2k$-element subset of $X$ is treelike.

Conversely, assume that the restriction of $D$ to every $2k$-element subset of $X$ is treelike. Note that this implies that the restriction of $D$ to every $i$-element subset $Y$ of $X$, $k \leq i \leq 2k$, is treelike, that is, there exists a weighted phylogenetic tree $(T_Y,\omega_Y)$ on $Y$ with $\omega_Y$ non-negative and interior-positive such that $D|_Y = D^k_{(T_Y,\omega_Y)}$ holds.

Now consider an arbitrary pair of elements $\{a,b\} \in \binom{X}{2}$. We claim that in any weighted phylogenetic tree $(T_Z,\omega_Z)$, $Z \in \binom{X}{2k-1}$, $\{a,b\} \subseteq Z$, the induced distance $d_{(T_Z,\omega_Z)}(a,b)$ is the same. To show this, it suffices to consider such sets $Z, Z'$ with $Z' = (Z \setminus \{x\}) \cup \{y\}$ for two distinct elements $x, y \in X \setminus \{a,b\}$. We now consider the weighted phylogenetic tree $(T_Y,\omega_Y)$ for the $2k$-element set $Y := Z \cup \{y\}$. Since $|Z| = |Z'| = 2k - 1$, it follows by Theorem 3(i) that $(T_Z,\omega_Z)$ is isomorphic to $(T_Y|_Z, \omega_Y|_Z)$ and $(T_{Z'},\omega_{Z'})$ is isomorphic to $(T_Y|_{Z'}, \omega_Y|_{Z'})$. This implies that the induced distance between $a$ and $b$ is the same for $(T_Z,\omega_Z)$ and $(T_{Z'},\omega_{Z'})$, as claimed.

As a consequence, for every pair $\{a,b\} \in \binom{X}{2}$, the restriction of $D$ to any $(2k-1)$-element subset of $X$ containing $a$ and $b$ yields the same distance between $a$ and $b$, which we denote by $\delta(a,b)$. Note that the restriction of the so-defined 2-dissimilarity $\delta$ on $X$ to every 4-element subset of $X$ is treelike: For any four distinct elements $a,b,c,d \in X$ we can select an arbitrary $Z \in \binom{X}{2k-1}$ with $\{a,b,c,d\} \subseteq Z$ and in the weighted phylogenetic tree $(T_Z,\omega_Z)$ the induced distances between $a,b,c,d$ will equal the corresponding values of $\delta$. Hence, (since the theorem holds for $k = 2$) there exists a unique weighted phylogenetic tree $(T,\omega)$ on $X$ with $\omega$ non-negative and interior-positive such that $D^2_{(T,\omega)} = \delta$ holds. Moreover, the restriction of $(T,\omega)$ to any $2k$-element subset $Z \subseteq X$ is isomorphic to $(T_Z,\omega_Z)$. Hence, the $k$-dissimilarity $D^k_{(T,\omega)}$ must be $D$.

(ii) Again, if a $k$-dissimilarity on $X$ is equidistant, so is its restriction to each $2k$-subset of $X$. So, let $D$ be a $k$-dissimilarity on $X$ such that its restriction to each $Y \in \binom{X}{2k}$ is represented by a weighted, rooted phylogenetic tree $(T_Y,\rho_Y,\omega_Y)$ on $Y$ with $\omega_Y$ equidistant and interior-positive. Then, for some sufficiently large $M > 0$, all the weighted phylogenetic trees $(T_Y(M),\omega_Y(M))$ are such that $\omega_Y(M)$ is non-negative and interior-positive. Therefore, by the first part of the theorem, there exists a unique weighted phylogenetic tree $(T,\omega)$ on $X$ with $\omega$ non-negative and interior-positive such that $D^k_{(T,\omega)}(A) = D(A) + kM$ holds for all $A \in \binom{X}{k}$. Moreover, $(T,\omega)$ must be isomorphic to $(T'(M),\omega'(M))$ for some weighted, rooted phylogenetic tree $(T',\rho',\omega')$ on $X$ with $\omega'$ equidistant and interior-positive, since otherwise there would exist some $Y \in \binom{X}{2k}$ such that $\omega_Y$ is not equidistant in view of the fact that, for all $Y \in \binom{X}{2k}$, $(T|_Y,\omega|_Y)$ is isomorphic to $(T_Y(M),\omega_Y(M))$. Hence $D$ must equal $D^k_{(T',\omega')}$, as required. □

Using this theorem we now show that, for fixed $k \geq 3$, it is possible to efficiently check when a $k$-dissimilarity is treelike/equidistant. Note that any algorithm to check whether a given $k$-dissimilarity $D$ is treelike/equidistant needs to read $D$ first. Assuming that $D$ is given as the list of values it takes on for each $k$-element subset of $X$, this yields a lower bound of order $|X|^k$ on the run-time of any such algorithm.

Corollary 1. For any fixed $k \geq 3$ and any $k$-dissimilarity $D$ on $X$, there is an algorithm with run-time in $O(f(k) \cdot |X|^{2k})$ to decide whether $D$ is treelike/equidistant or not, where $f$ is a function that does not depend on $|X|$.

Proof. Given a $k$-dissimilarity $D$ on $X$, it suffices to check for every $Z \in \binom{X}{2k}$ whether $D|_Z$ is treelike/equidistant. To do this, one can enumerate all (isomorphism classes of) unweighted phylogenetic trees (rooted or unrooted) with $2k$ leaves labeled by the elements in $Z$. Note that the number of these trees depends on $k$ but not on $|X|$. For each of those trees $T$, it remains to check if there exists an edge-weighting $\omega$ with certain properties so that $D^k_{(T,\omega)} = D|_Z$ holds. The latter can be phrased as a test whether a system of linear equations and inequalities has a solution, a problem for which a polynomial time algorithm is known (see e.g. Schrijver (1986)). Therefore, one can check in $O(f(k))$ time whether $D|_Z$ is treelike/equidistant where $f$ is a function that does not depend on $|X|$. Since the number of $2k$-element subsets of $X$ is in $O(|X|^{2k})$, this establishes the required run-time bound. □

Figure 3: (a) A weighted, rooted phylogenetic tree $(T,\rho,\omega)$ with an equidistant and interior-positive edge-weighting $\omega$. (b) The same rooted phylogenetic tree but with the equidistant edge-weighting $\omega_{(k,\alpha)}$ for $k = 5$ and $\alpha = 10$. Note that $D^k_{(T,\omega)} = D^k_{(T,\omega_{(k,\alpha)})}$.
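The outer loop of the procedure in the proof of Corollary 1 can be sketched as follows. The sketch only captures the reduction given by Theorem 4, that is, the enumeration of all $2k$-element subsets; the test for a single $2k$-element set (tree enumeration plus a linear feasibility check) is not implemented and is represented by a hypothetical callable `small_case_is_treelike` supplied by the caller. The dictionary encoding of $D$ and all function names are likewise assumptions.

```python
from itertools import combinations


def restriction(D, subset, k):
    """Restrict a k-dissimilarity (dict keyed by frozensets) to `subset`."""
    return {frozenset(A): D[frozenset(A)] for A in combinations(sorted(subset), k)}


def is_treelike(D, X, k, small_case_is_treelike):
    """Decide treelikeness via Theorem 4, given a test for 2k-element sets.

    small_case_is_treelike -- hypothetical callable that takes a
    k-dissimilarity on a 2k-element set and returns True or False; it stands
    in for the tree-enumeration and linear-feasibility step of the proof.
    """
    if len(X) < 2 * k:
        raise ValueError("Theorem 4 assumes |X| >= 2k")
    # The number of iterations is the binomial coefficient C(|X|, 2k).
    return all(
        small_case_is_treelike(restriction(D, Z, k))
        for Z in combinations(sorted(X), 2 * k)
    )
```

For $|X| = 7$ and $k = 3$, for example, the loop visits the 7 subsets of size 6, and each restriction has $\binom{6}{3} = 20$ entries.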



4 Sharpness of the bounds 

In this section, we shall prove the second part of Theorem 2, that is, we shall prove that the bounds presented in the theorem are indeed sharp. More specifically, for each $k \geq 3$, we will present an example of a $k$-dissimilarity $D$ whose restriction to every $(2k-1)$-element subset is treelike/equidistant while $D$ is not treelike/equidistant. These examples will be presented in Examples 1 and 2 below.

We begin by presenting a useful lemma. Assume we have $k \geq 3$, $|X| \geq k$ and that $(T = (V,E),\rho,\omega)$ is a weighted, rooted phylogenetic tree on $X$ with $\omega$ equidistant and interior-positive. In addition, assume that $\rho$ is adjacent to precisely two vertices $u$ and $v$ (cf. Figure 3(a)). Put $a = \omega(\{\rho,u\})$ and $b = \omega(\{\rho,v\})$. Now define, for each $\alpha \in \mathbb{R}$ with $\alpha < 2\min\{a,b\}$, a new equidistant edge-weighting $\omega_{(k,\alpha)}$ for $T$ (cf. Figure 3(b)) by putting, for all $e \in E$,

$$\omega_{(k,\alpha)}(e) = \begin{cases} \omega(e) + \alpha/k, & \text{if } e \text{ is incident to a leaf},\\ \omega(e) - \alpha/2, & \text{if } e \text{ is incident to } \rho,\\ \omega(e), & \text{else.} \end{cases}$$

Lemma 1. Suppose $k \geq 3$, $|X| \geq k$ and that $(T = (V,E),\rho,\omega)$ is a weighted, rooted phylogenetic tree on $X$ with $\omega$ equidistant and interior-positive such that $\rho$ is adjacent with precisely two vertices $u$ and $v$ and put $a = \omega(\{\rho,u\})$ and $b = \omega(\{\rho,v\})$. Then for each $\alpha \in \mathbb{R}$ with $\alpha < 2\min\{a,b\}$ and $A \in \binom{X}{k}$ we have

$$D^k_{(T,\omega_{(k,\alpha)})}(A) = \begin{cases} D^k_{(T,\omega)}(A) + \alpha, & \text{if } A \subseteq L(T_u) \text{ or } A \subseteq L(T_v),\\ D^k_{(T,\omega)}(A), & \text{else.} \end{cases}$$

In particular, $(T,\rho,\omega)$ and $(T,\rho,\omega_{(k,\alpha)})$ induce the same $k$-dissimilarity if and only if we have $\alpha = 0$ or $|L(T_u)| \leq k - 1$ and $|L(T_v)| \leq k - 1$ hold.

Proof. Let $A \in \binom{X}{k}$. The restriction $T|_A$ contains $k$ pendant edges. Moreover, $T|_A$ contains the two edges incident in $T$ with $\rho$ if and only if we have $A \cap L(T_u) \neq \emptyset$ and $A \cap L(T_v) \neq \emptyset$. Hence we have $D^k_{(T,\omega_{(k,\alpha)})}(A) = D^k_{(T,\omega)}(A) + k\frac{\alpha}{k}$ in case $A \subseteq L(T_u)$ or $A \subseteq L(T_v)$ holds and we have $D^k_{(T,\omega_{(k,\alpha)})}(A) = D^k_{(T,\omega)}(A) + k\frac{\alpha}{k} - 2\frac{\alpha}{2} = D^k_{(T,\omega)}(A)$ otherwise, as claimed. The second assertion trivially holds if $\alpha = 0$. If $\alpha \neq 0$, then it holds since all $A \in \binom{X}{k}$ contain leaves from both $L(T_u)$ and $L(T_v)$ if and only if $|L(T_u)| \leq k - 1$ and $|L(T_v)| \leq k - 1$ hold. □
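The statement of Lemma 1 can be checked numerically on a small example. The Python sketch below builds a hypothetical equidistant tree whose root has two children $u$ and $v$ with three leaves each, applies the reweighting $\omega_{(k,\alpha)}$ for $k = 3$ and $\alpha = 3$, and records by how much $D^k$ changes on every $k$-subset. The tree, the dictionary encoding and all function names are assumptions chosen for illustration; the example deliberately avoids edges that are both pendant and incident to the root.

```python
from collections import defaultdict
from itertools import combinations


def subtree_weight(weights, subset):
    """Total weight of the smallest subtree connecting `subset`.

    weights -- dict mapping (u, v) edge tuples of a tree to their weight
    """
    subset = set(subset)
    adjacency = defaultdict(list)
    for (u, v), w in weights.items():
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))
    root = next(iter(subset))
    parent, order, stack = {root: (None, 0)}, [root], [root]
    while stack:
        u = stack.pop()
        for v, w in adjacency[u]:
            if v not in parent:
                parent[v] = (u, w)
                order.append(v)
                stack.append(v)
    below = {v: int(v in subset) for v in order}
    total = 0
    for v in reversed(order):  # children before parents
        p, w = parent[v]
        if p is not None:
            if below[v] > 0:  # edge separates members of the subset
                total += w
            below[p] += below[v]
    return total


def reweight(weights, root, leaves, k, alpha):
    """omega_(k, alpha): pendant edges gain alpha/k, root edges lose alpha/2."""
    new = {}
    for (u, v), w in weights.items():
        if u in leaves or v in leaves:
            new[(u, v)] = w + alpha / k
        elif root in (u, v):
            new[(u, v)] = w - alpha / 2
        else:
            new[(u, v)] = w
    return new


# Hypothetical equidistant tree: root r, children u and v, three leaves each.
LEAVES_U, LEAVES_V = {"a", "b", "c"}, {"d", "e", "f"}
OMEGA = {("r", "u"): 2, ("r", "v"): 2}
OMEGA.update({("u", x): 1 for x in LEAVES_U})
OMEGA.update({("v", x): 1 for x in LEAVES_V})


def lemma1_shifts(k=3, alpha=3):
    """Map each k-subset A to D^k under omega_(k, alpha) minus D^k under omega."""
    leaves = LEAVES_U | LEAVES_V
    shifted = reweight(OMEGA, "r", leaves, k, alpha)
    return {
        frozenset(A): subtree_weight(shifted, A) - subtree_weight(OMEGA, A)
        for A in combinations(sorted(leaves), k)
    }
```

In this example the difference equals $\alpha$ on the two subsets $\{a,b,c\}$ and $\{d,e,f\}$ and vanishes on every subset that meets both sides of the root, as the lemma predicts.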

Example 1 (Equidistant). Let $k \geq 3$ and $(T,\rho,\omega)$ be a weighted, rooted phylogenetic tree on $X$ with $\omega$ equidistant and interior-positive such that the root $\rho$ of $T$ is adjacent with precisely two vertices $u$ and $v$. Put $a = \omega(\{\rho,u\})$ and $b = \omega(\{\rho,v\})$. Assume that $|L(T_u)|, |L(T_v)| \geq k$. Now, choose some non-zero $\alpha < 2\min\{a,b\}$ and define a $k$-dissimilarity $D$ on $X$ via

$$D(A) = \begin{cases} D^k_{(T,\omega_{(k,\alpha)})}(A), & \text{if } A \subseteq L(T_u),\\ D^k_{(T,\omega)}(A), & \text{else,} \end{cases}$$

for all $A \in \binom{X}{k}$. We first show that the restriction of $D$ to any $(2k-1)$-element subset of $X$ is equidistant: For any $Y \in \binom{X}{2k-1}$ define a weighted, rooted phylogenetic tree $(T_Y,\rho_Y,\omega_Y)$ on $Y$ with $\omega_Y$ equidistant and interior-positive by setting

$$(T_Y,\rho_Y,\omega_Y) = \begin{cases} (T,\rho,\omega_{(k,\alpha)})|_Y, & \text{if } |Y \cap L(T_u)| \geq k,\\ (T,\rho,\omega)|_Y, & \text{else.} \end{cases}$$

By the definition of $D$ it follows that $D|_Y = D^k_{(T_Y,\omega_Y)}$ holds.

We now show that $D$ is not equidistant. It suffices to show that there exists some subset $Z$ of $X$ such that $D|_Z$ is not equidistant. So, let $Z \subseteq X$ be such that $|Z \cap L(T_u)| = |Z \cap L(T_v)| = k$ (such a subset exists since $|L(T_u)|, |L(T_v)| \geq k$), and suppose there exists some weighted, rooted phylogenetic tree $(T_Z,\rho_Z,\omega_Z)$ on $Z$ with $\omega_Z$ equidistant and interior-positive such that $D|_Z = D^k_{(T_Z,\omega_Z)}$ holds. For every $a \in Z \cap L(T_u)$ we define the weighted, rooted phylogenetic tree $(T_a,\rho_a,\omega_a) := (T_Z,\rho_Z,\omega_Z)|_{Z \setminus \{a\}}$ on $Z \setminus \{a\}$. Choose distinct $x, y \in Z \cap L(T_u)$ which are not adjacent to a common vertex of degree 3 (this is possible since $|Z \cap L(T_u)| \geq k > 2$). By Theorem 3(ii), $(T_x,\rho_x,\omega_x)$ is isomorphic to $(T_{Z \setminus \{x\}},\rho_{Z \setminus \{x\}},\omega_{Z \setminus \{x\}})$ and $(T,\rho,\omega)|_{Z \setminus \{x\}}$, and $(T_y,\rho_y,\omega_y)$ is isomorphic to $(T_{Z \setminus \{y\}},\rho_{Z \setminus \{y\}},\omega_{Z \setminus \{y\}})$ and $(T,\rho,\omega)|_{Z \setminus \{y\}}$.

By our choice of $x$ and $y$, up to isomorphism, there exists only one possible weighted, rooted phylogenetic tree on $Z$ whose restriction to $Z \setminus \{x\}$ and $Z \setminus \{y\}$ is $T_x$ and $T_y$, respectively. Hence $(T_Z,\rho_Z,\omega_Z)$ is isomorphic to $(T,\rho,\omega)|_Z$. However, this contradicts the fact that $(T_Z,\rho_Z,\omega_Z)$ induces $D|_Z$ since $D(Z \cap L(T_u)) = D^k_{(T,\omega_{(k,\alpha)})}(Z \cap L(T_u)) \neq D^k_{(T,\omega)}(Z \cap L(T_u)) = D^k_{(T_Z,\omega_Z)}(Z \cap L(T_u))$, where the first equality is by definition and the second follows from the fact that $(T_Z,\rho_Z,\omega_Z)$ and $(T,\rho,\omega)|_Z$ are isomorphic.

Example 2 (Treelike). An example for the case $k = 3$ was given by Chepoi and Fichet (2007). Here we give an example for general $k$. Based on the weighted, rooted phylogenetic tree $(T,\rho,\omega)$ in Example 1, consider $(T(M),\omega(M))$ for some sufficiently large constant $M > 0$ and choose $\alpha > 0$. Then, using the same arguments as in Example 1, it is straightforward to show that the $k$-dissimilarity $D$ constructed in the same way as in Example 1 is not treelike while its restriction to every $(2k-1)$-element subset of $X$ is treelike.



5 The case $2k - 1$ for equidistant $k$-dissimilarities

It is well-known that if the restriction of a 2-dissimilarity $D$ on $X$ to every subset of $X$ of size 3 is equidistant, then $D$ is equidistant (Semple and Steel, 2003, Theorem 7.2.5). In contrast, in Example 1 we have seen that for $k \geq 3$ a $k$-dissimilarity $D$ is not necessarily equidistant if its restriction to every $(2k-1)$-element subset is equidistant. However, we shall now prove that we can still recover a tree if the restriction of such a $D$ to all $(2k-1)$-element subsets $Y \subseteq X$ is induced by a weighted, rooted phylogenetic tree $(T,\rho,\omega)$ on $Y$ with $\omega$ equidistant and interior-positive such that $(T,\rho,\omega)$ is generic, that is, $T$ is binary and no two distinct interior vertices have the same height.

Theorem 5. Let $k \geq 3$ and $D$ be a $k$-dissimilarity map on $X$ such that, for all $Y \in \binom{X}{2k-1}$, there exists a generic weighted, rooted phylogenetic tree $(T_Y,\rho_Y,\omega_Y)$ on $Y$ with $\omega_Y$ equidistant and interior-positive such that $D|_Y = D^k_{(T_Y,\omega_Y)}$ holds. Then there exists a binary rooted phylogenetic tree $T$ on $X$ such that, for all $Y \in \binom{X}{2k-1}$, the unweighted, rooted phylogenetic trees $T_Y$ and $T|_Y$ are isomorphic.

To prove this theorem, we shall use a well-known result about collections of rooted phylogenetic trees each having three leaves, that will allow us to "merge" trees, which we now recall. A triplet on $X$ is a pair $(\{a,b\},c)$ with $a,b,c \in X$ distinct, which we denote also by $ab|c$. The set of all triplets on $X$ is denoted by $\mathcal{R}(X)$, and a subset $\mathcal{R}$ of $\mathcal{R}(X)$ is called a triplet system on $X$. Given a rooted phylogenetic tree $T$ on $X$, the triplet system $\mathcal{R}_T$ of $T$ is the set of all triplets $ab|c$ on $X$ such that the path from $a$ to $b$ in $T$ is vertex-disjoint from the path from $c$ to the root $\rho$ in $T$. It is easily seen that for a rooted phylogenetic tree $T$ on $X$ and a rooted phylogenetic tree $T'$ on $Y \subseteq X$, we have $\mathcal{R}_{T'} \subseteq \mathcal{R}_T$ if $T'$ is isomorphic to $T|_Y$. We now state the aforementioned result:



Theorem 6 (Theorem 9.2 (ii) in Dress, Huber, Koolen, Moulton, and Spillner (2011)). A rooted phylogenetic tree $T$ on $X$ is, up to isomorphism, uniquely determined by the triplet system $\mathcal{R}_T$. Moreover, given a triplet system $\mathcal{R} \subseteq \mathcal{R}(X)$ there exists a rooted phylogenetic tree $T$ on $X$ with $\mathcal{R}_T = \mathcal{R}$ if and only if $\mathcal{R}$ satisfies the following two conditions:

($\mathcal{R}$1) For any three elements $a,b,c \in X$ at most one of the triplets $ab|c$, $bc|a$ and $ca|b$ is contained in $\mathcal{R}$.

($\mathcal{R}$2) For any four elements $a,b,c,d \in X$, $ab|c \in \mathcal{R}$ implies $ad|c \in \mathcal{R}$ or $ab|d \in \mathcal{R}$.
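The two conditions of Theorem 6 are straightforward to test for a given triplet system. In the Python sketch below a triplet $ab|c$ is encoded as a pair consisting of the unordered pair $\{a,b\}$ and the element $c$, so that $ab|c$ and $ba|c$ coincide; this encoding, the function names, and the reading of both conditions as ranging over distinct elements are assumptions made for illustration.

```python
from itertools import combinations, permutations


def triplet(a, b, c):
    """Encode ab|c so that ab|c and ba|c are the same object."""
    return (frozenset((a, b)), c)


def satisfies_r1(R, X):
    """At most one of ab|c, bc|a, ca|b lies in R for any three elements."""
    for a, b, c in combinations(X, 3):
        candidates = (triplet(a, b, c), triplet(b, c, a), triplet(c, a, b))
        if sum(t in R for t in candidates) > 1:
            return False
    return True


def satisfies_r2(R, X):
    """ab|c in R implies ad|c in R or ab|d in R for every further element d."""
    for pair, c in R:
        # Both orderings of the pair are checked, since ab|c equals ba|c.
        for a, b in permutations(pair):
            for d in X:
                if d in pair or d == c:
                    continue
                if triplet(a, d, c) not in R and triplet(a, b, d) not in R:
                    return False
    return True


# Hypothetical example: the triplets of the rooted tree ((a,b),(c,d)).
LEAVES = ["a", "b", "c", "d"]
BALANCED = {
    triplet("a", "b", "c"), triplet("a", "b", "d"),
    triplet("c", "d", "a"), triplet("c", "d", "b"),
}
```

The system `BALANCED` passes both tests, whereas the single triplet $ab|c$ on four elements violates the second condition, since neither $ad|c$ nor $ab|d$ is present.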
To prove Theorem 5 we will also use the following rather technical result:

Lemma 2. Let $k \geq 3$ be an integer, $X$ a set with $|X| = 2k - 2$ and $(T_1,\rho_1,\omega_1)$ and $(T_2,\rho_2,\omega_2)$ be two generic weighted, rooted phylogenetic trees on $X$ with $\omega_i$ equidistant and interior-positive, $i \in \{1,2\}$. If $D^k_{(T_1,\omega_1)} = D^k_{(T_2,\omega_2)}$ then $T_1$ and $T_2$ are isomorphic as unweighted, rooted phylogenetic trees.

Proof. We distinguish two cases. First consider the case that at least one of $T_1$ and $T_2$, say $T_1$, contains a vertex $v_1$ such that the set $B$ of leaves of the rooted subtree $(T_1)_{v_1}$ has cardinality $k - 1$. Define $A := X \setminus B$. We claim that $T_2$ must contain a vertex $v_2$ such that the set of leaves of $(T_2)_{v_2}$ is $B$. To establish this, first note that $D^k_{(T_1,\omega_1)}(A \cup \{b\}) = D^k_{(T_1,\omega_1)}(A \cup \{b'\})$ and, therefore, in view of $D^k_{(T_1,\omega_1)} = D^k_{(T_2,\omega_2)}$, also $D^k_{(T_2,\omega_2)}(A \cup \{b\}) = D^k_{(T_2,\omega_2)}(A \cup \{b'\})$ must hold for all $b, b' \in B$. This implies that, for every $b \in B$, the height of the vertex $w_b$ where the path from $b$ to the root of $T_2$ first meets the subtree $T_2|_A$ must be the same. Hence, since $T_2$ is generic, $w_b = w_{b'}$ holds for all $b, b' \in B$. But then the tree $T_2|_B$ must equal the tree $(T_2)_{v_2}$ for some vertex $v_2$ of $T_2$, as claimed.

Next, we claim that $T_1|_A$ and $T_2|_A$ as well as $T_1|_B$ and $T_2|_B$ are isomorphic as unweighted, rooted phylogenetic trees. By Theorem 6 it suffices to show that $\mathcal{R}_{T_1|_A} = \mathcal{R}_{T_2|_A}$ and $\mathcal{R}_{T_1|_B} = \mathcal{R}_{T_2|_B}$ hold. In the following we will focus on the set $A$. A completely analogous argument yields $\mathcal{R}_{T_1|_B} = \mathcal{R}_{T_2|_B}$. So, consider three arbitrary distinct elements $a$, $b$ and $c$ in $A$ and an arbitrary $(k-2)$-element subset $C$ of $B$. Up to relabeling, Figure 4 depicts the possible cases for the structure of the tree $T_1|_{C \cup \{a,b,c\}}$. Note that, by the assumption that $T_1$ is generic, cases (b), (d), (g), (i), (j) and (k) are ruled out. In the remaining cases we have:

(a) $D^k_{(T_1,\omega_1)}(C \cup \{a,b\}) = D^k_{(T_1,\omega_1)}(C \cup \{a,c\}) < D^k_{(T_1,\omega_1)}(C \cup \{b,c\})$

(c) (and, similarly, (e) and (f)) $D^k_{(T_1,\omega_1)}(C \cup \{b,c\}) < D^k_{(T_1,\omega_1)}(C \cup \{a,b\}) = D^k_{(T_1,\omega_1)}(C \cup \{a,c\})$

So, we have $ab|c \in \mathcal{R}_{T_1|_A}$ if and only if either

(1) $D^k_{(T_1,\omega_1)}(C \cup \{a,c\}) = D^k_{(T_1,\omega_1)}(C \cup \{b,c\}) \neq D^k_{(T_1,\omega_1)}(C \cup \{a,b\})$ holds, or

(2) $D^k_{(T_1,\omega_1)}(C \cup \{a,b\})$, $D^k_{(T_1,\omega_1)}(C \cup \{a,c\})$ and $D^k_{(T_1,\omega_1)}(C \cup \{b,c\})$ are pairwise distinct and $D^k_{(T_1,\omega_1)}(C \cup \{a,b\})$ is the smallest value among them.

But this implies, in view of the fact that $D^k_{(T_1,\omega_1)} = D^k_{(T_2,\omega_2)}$ holds, that we have $\mathcal{R}_{T_1|_A} = \mathcal{R}_{T_2|_A}$, as required.

Now, to show that the trees $T_1$ and $T_2$ are isomorphic as unweighted, rooted phylogenetic trees, it remains to show that $T_1|_B$ and $T_2|_B$ are attached to $T_1|_A$ and $T_2|_A$, respectively, at the same position. To establish this, consider the set $A^*$ containing all $a \in A$ such that $D^k_{(T_1,\omega_1)}(B \cup \{a\})$ is minimal. Note that there must exist vertices $w_1$ in $T_1$ and $w_2$ in $T_2$ such that $T_1|_{A^*} = (T_1)_{w_1}$ and $T_2|_{A^*} = (T_2)_{w_2}$ hold. Moreover, note that $v_1$ and $w_1$ as well as $v_2$ and $w_2$ must be adjacent to a common vertex, namely the vertex where $T_1|_B$ and $T_2|_B$ are attached to $T_1|_A$ and $T_2|_A$, respectively. But this implies that $T_1$ and $T_2$ are isomorphic as unweighted, rooted phylogenetic trees.

Figure 4: Schematic representations of the cases (a) to (k) considered in the proof of Lemma 2 in the context of establishing that $\mathcal{R}_{T_1|_A} = \mathcal{R}_{T_2|_A}$ holds. Each panel shows a rooted tree with leaves labeled $C$, $a$, $b$ and $c$.

Next consider the case that neither $T_1$ nor $T_2$ contains a vertex such that the subtree
induced by that vertex has precisely $k-1$ leaves. Choose some $M \in \mathbb{R}$ large
enough and consider the weighted phylogenetic trees $(T_1(M), \omega_1(M))$ and
$(T_2(M), \omega_2(M))$. Note that, for every edge $e$ of $T_j(M)$, $j \in \{1,2\}$, removing
$e$ from $T_j(M)$ yields two subtrees, one of which has at least $k$ leaves. But this is the
crucial property used in the proof of Theorem 1 presented in Pachter and Speyer (2004),
and the lower bound $|X| \geq 2k-1$ stated in this theorem is only needed to ensure that
this property holds. Hence, even in case $|X| = 2k-2$ the proof can be applied as long as
we have this property. This implies that $T_1(M)$ and $T_2(M)$ are isomorphic (even as
weighted phylogenetic trees!) and, hence, also $T_1$ and $T_2$, as required. □



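Proof of the theorem. Define $\mathcal{R} := \bigcup_{Y \in \binom{X}{2k-1}} \mathcal{R}_{T_Y}$.
We have to show that Conditions ($\mathcal{R}$1) and ($\mathcal{R}$2) hold for $\mathcal{R}$.

To show that ($\mathcal{R}$1) holds, it suffices to show that for all $Z \in \binom{X}{2k-2}$,
distinct $x, y \in X \setminus Z$ and $a, b, c \in Z$, at most one of the triplets $ab|c$,
$bc|a$ and $ca|b$ is contained in $\mathcal{R}_{T_{Z \cup \{x\}}} \cup \mathcal{R}_{T_{Z \cup \{y\}}}$.
Lemma 2 implies that the phylogenetic trees $T^* := T_{Z \cup \{x\}}|_Z$ and
$T_{Z \cup \{y\}}|_Z$ are isomorphic. By Theorem 6, Condition ($\mathcal{R}$1) holds for
$\mathcal{R}_{T^*}$, as required.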

We now show that ($\mathcal{R}$2) holds. Let $a, b, c, d$ be distinct elements of $X$ and
suppose $ab|c \in \mathcal{R}$. Then there exists some $Y \in \binom{X}{2k-1}$ such that
$ab|c \in \mathcal{R}_{T_Y}$. If $d \in Y$, then we have $ad|c \in \mathcal{R}$ or
$ab|d \in \mathcal{R}$ since ($\mathcal{R}$2) holds for $\mathcal{R}_{T_Y}$. If $d \notin Y$,
take some $x \in Y \setminus \{a,b,c\}$ and define $Y' := (Y \setminus \{x\}) \cup \{d\}$.
Again, by Lemma 2 the phylogenetic trees $T_Y|_{Y \setminus \{x\}}$ and
$T_{Y'}|_{Y \setminus \{x\}}$ are isomorphic, hence $ab|c$ is also an element of
$\mathcal{R}_{T_{Y'}}$, and hence $ad|c \in \mathcal{R}$ or $ab|d \in \mathcal{R}$ since
($\mathcal{R}$2) holds for $\mathcal{R}_{T_{Y'}}$.



□ 
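As an illustration only, the two triplet conditions can be checked mechanically in the form in which they are used in the proof above: at most one of $ab|c$, $bc|a$, $ca|b$ lies in the triplet set, and $ab|c$ in the set forces $ad|c$ or $ab|d$ to be in the set for every further element $d$. The sketch below assumes this reading of the two conditions (the precise statement of Theorem 6 is not repeated here); the encoding of a triplet as a pair and all function names are hypothetical.

```python
from itertools import combinations, permutations

def triplet(a, b, c):
    """Encode the rooted triplet ab|c (a and b on one side, c on the other)."""
    return (frozenset((a, b)), c)

def satisfies_R1(R, X):
    """At most one of ab|c, bc|a, ca|b is in R for every 3-subset of X."""
    for a, b, c in combinations(X, 3):
        candidates = (triplet(a, b, c), triplet(b, c, a), triplet(c, a, b))
        if sum(t in R for t in candidates) > 1:
            return False
    return True

def satisfies_R2(R, X):
    """If ab|c is in R, then ad|c or ab|d is in R for every other element d."""
    for a, b, c, d in permutations(X, 4):
        if triplet(a, b, c) in R:
            if triplet(a, d, c) not in R and triplet(a, b, d) not in R:
                return False
    return True

# Triplets displayed by the rooted caterpillar (((a,b),c),d).
X = ["a", "b", "c", "d"]
caterpillar = {triplet("a", "b", "c"), triplet("a", "b", "d"),
               triplet("a", "c", "d"), triplet("b", "c", "d")}
print(satisfies_R1(caterpillar, X), satisfies_R2(caterpillar, X))
```

Iterating over ordered 4-tuples in `satisfies_R2` covers both roles that the two elements on the same side of a triplet can play.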



Remark 1. We suspect, but have not been able to prove, that, for $k \geq 3$, if a
$k$-dissimilarity $D$ on $X$ is induced by an arbitrary weighted, rooted phylogenetic tree
$(T_Y, \rho_Y, \omega_Y)$ with $\omega_Y$ equidistant and interior-positive for all
$Y \in \binom{X}{2k-1}$, then it determines a rooted phylogenetic tree $T$ on $X$ such that,
for all $Y \in \binom{X}{2k-1}$, $T_Y$ is isomorphic to $T|_Y$. Furthermore, depending on the
topology of the unweighted phylogenetic tree $T$ arising in this way, it might even still
be possible to assign weights to the edges of $T$ so that the edge-weighting is
equidistant and the induced $k$-dissimilarity is $D$. For example, if the number of leaves
in one of the subtrees induced by the vertices adjacent to the root of $T$ is smaller than
$k$, then one can extend the arguments in the proof of Theorem 4 to show that one can
indeed construct a suitable edge-weighting for $T$, and hence $D$ is equidistant in this
case.



6 3-dissimilarities 



In this section, we prove that treelike and equidistant 3-dissimilarities can be
characterized by certain 6-point conditions. We begin by recalling some conditions for
characterizing treelike and equidistant 2-dissimilarities (see e.g. Smolenskii (1962);
Zaretsky (1965); Buneman (1971); Gordon (1987); and Semple and Steel (2003)).



It is well-known that a 2-dissimilarity $D$ on $X$ is treelike if and only if $D$ is
non-negative, it satisfies the triangle inequality (i.e.,
$D(x_1,x_3) \leq D(x_1,x_2) + D(x_2,x_3)$ holds for any three distinct elements
$x_1, x_2, x_3 \in X$) and

\[ D(x,x') + D(y,y') \leq \max\{D(x,y) + D(x',y'),\; D(x,y') + D(x',y)\} \tag{1} \]

holds for any four distinct $x, x', y, y' \in X$. Similarly, it is known that $D$ is
equidistant if and only if

\[ D(x,y) \leq \max\{D(x,z),\; D(z,y)\} \tag{2} \]

holds for any three distinct $x, y, z \in X$.

Inequalities (1) and (2) are commonly called the 4-point and ultrametric conditions,
respectively. Note that non-negativity of $D$ and the triangle inequality follow from the
4-point condition if one defines $D(a,a) = 0$ for all $a \in X$ and then drops the
requirement that the elements are pairwise distinct. However, as we view a
$k$-dissimilarity as being a map from $\binom{X}{k}$ into $\mathbb{R}$, we need to
explicitly require these additional properties.
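The two classical conditions are easy to test by brute force on small examples. The following is a minimal sketch, assuming a 2-dissimilarity is stored as a dictionary keyed by 2-element `frozenset`s; this storage format, the tolerance `tol` and the function names are illustrative choices, not part of the paper.

```python
from itertools import combinations, permutations

def d2(D, x, y):
    """Value of the 2-dissimilarity D on the unordered pair {x, y}."""
    return D[frozenset((x, y))]

def is_treelike_2(D, X, tol=1e-9):
    """Non-negativity, triangle inequality and the 4-point condition (1)."""
    if any(d2(D, x, y) < -tol for x, y in combinations(X, 2)):
        return False
    for x, y, z in permutations(X, 3):
        if d2(D, x, z) > d2(D, x, y) + d2(D, y, z) + tol:
            return False
    for x, xp, y, yp in permutations(X, 4):
        lhs = d2(D, x, xp) + d2(D, y, yp)
        rhs = max(d2(D, x, y) + d2(D, xp, yp), d2(D, x, yp) + d2(D, xp, y))
        if lhs > rhs + tol:
            return False
    return True

def is_equidistant_2(D, X, tol=1e-9):
    """Ultrametric condition (2) for all ordered triples of distinct elements."""
    for x, y, z in permutations(X, 3):
        if d2(D, x, y) > max(d2(D, x, z), d2(D, z, y)) + tol:
            return False
    return True

# Path distances of the tree with cherries {a,b} and {c,d} and all edges of length 1.
X = ["a", "b", "c", "d"]
tree = {frozenset(p): 3 for p in combinations(X, 2)}
tree[frozenset("ab")] = 2
tree[frozenset("cd")] = 2
print(is_treelike_2(tree, X), is_equidistant_2(tree, X))
```

Looping over ordered tuples is wasteful but avoids having to reason about which relabelings of a fixed subset are equivalent.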

We now present similar conditions that characterize treelike/equidistant
3-dissimilarities that are obtained by associating with every 3-dissimilarity a suitable
2-dissimilarity. The construction of this 2-dissimilarity is similar to the approach
followed in the context of so-called perimeter models considered, for example, in Heiser
and Bennani (1997) and Chepoi and Fichet (2007).



Theorem 7. Let $D$ be a 3-dissimilarity on a set $X$ with $|X| \geq 5$.

(i) $D$ is treelike if and only if for all $\{a,b,c,d,e\} \in \binom{X}{5}$

\[
\begin{aligned}
& D(a,c,d) + D(a,c,e) + D(a,d,e) + D(b,c,d) + D(b,c,e) + D(b,d,e) \\
& \qquad \leq 2\,\bigl(D(a,b,c) + D(a,b,d) + D(a,b,e) + D(c,d,e)\bigr),
\end{aligned}
\tag{3}
\]

\[
\begin{aligned}
& 2\,\bigl(D(a,c,d) + D(a,c,e) + D(b,d,e)\bigr) \leq D(a,b,c) + D(a,b,d) + D(a,b,e) \\
& \qquad + D(a,d,e) + D(b,c,d) + D(b,c,e) + D(c,d,e),
\end{aligned}
\tag{4}
\]

\[
D(a,c,e) + D(b,d,e) \leq \max\bigl\{ D(a,b,e) + D(c,d,e),\; D(a,d,e) + D(b,c,e) \bigr\}
\tag{5}
\]

and for all $\{a,b,c,d,e,e'\} \in \binom{X}{6}$

\[
\begin{aligned}
& 2D(a,b,e) - D(a,c,e) - D(a,d,e) - D(b,c,e) - D(b,d,e) + 2D(c,d,e) \\
& \qquad = 2D(a,b,e') - D(a,c,e') - D(a,d,e') - D(b,c,e') - D(b,d,e') + 2D(c,d,e').
\end{aligned}
\tag{6}
\]

(ii) $D$ is equidistant if and only if for all $\{a,b,c,d,e\} \in \binom{X}{5}$

\[
D(a,c,d) + D(a,c,e) + D(b,d,e) \leq \max\bigl\{ D(a,b,d) + D(a,b,e) + D(c,d,e),\; D(a,d,e) + D(b,c,d) + D(b,c,e) \bigr\}
\tag{7}
\]

and for all $\{a,b,c,d,e,e'\} \in \binom{X}{6}$ Equation (6) holds.



Proof. For any $Y = \{a,b,c,d,e\} \in \binom{X}{5}$ we define a map
$\delta_Y : \binom{Y}{2} \to \mathbb{R}$ as follows. First define the vector

\[ v_Y := \bigl(D(a,b,c), D(a,b,d), D(a,b,e), D(a,c,d), \ldots, D(c,d,e)\bigr)^T \]

as well as the following matrix and its inverse (note that $A$ has full rank):

\[
A = \begin{pmatrix}
1&1&0&0&1&0&0&0&0&0\\
1&0&1&0&0&1&0&0&0&0\\
1&0&0&1&0&0&1&0&0&0\\
0&1&1&0&0&0&0&1&0&0\\
0&1&0&1&0&0&0&0&1&0\\
0&0&1&1&0&0&0&0&0&1\\
0&0&0&0&1&1&0&1&0&0\\
0&0&0&0&1&0&1&0&1&0\\
0&0&0&0&0&1&1&0&0&1\\
0&0&0&0&0&0&0&1&1&1
\end{pmatrix},
\qquad
A^{-1} = \frac{1}{6}\begin{pmatrix}
2&2&2&-1&-1&-1&-1&-1&-1&2\\
2&-1&-1&2&2&-1&-1&-1&2&-1\\
-1&2&-1&2&-1&2&-1&2&-1&-1\\
-1&-1&2&-1&2&2&2&-1&-1&-1\\
2&-1&-1&-1&-1&2&2&2&-1&-1\\
-1&2&-1&-1&2&-1&2&-1&2&-1\\
-1&-1&2&2&-1&-1&-1&2&2&-1\\
-1&-1&2&2&-1&-1&2&-1&-1&2\\
-1&2&-1&-1&2&-1&-1&2&-1&2\\
2&-1&-1&-1&-1&2&-1&-1&2&2
\end{pmatrix}.
\]

Then, using the notation

\[ u_Y = \bigl(\delta_Y(a,b), \delta_Y(a,c), \delta_Y(a,d), \delta_Y(a,e), \delta_Y(b,c), \ldots, \delta_Y(d,e)\bigr)^T, \]

the map $\delta_Y$ is defined by the unique solution of the system of linear equations

\[ 2 v_Y = A \cdot u_Y. \]

In particular we have:

\[
\begin{aligned}
3\delta_Y(a,b) &= 2D(a,b,c) + 2D(a,b,d) + 2D(a,b,e) - D(a,c,d) - D(a,c,e) - D(a,d,e) \\
&\quad - D(b,c,d) - D(b,c,e) - D(b,d,e) + 2D(c,d,e), \\
3\delta_Y(a,c) &= 2D(a,b,c) - D(a,b,d) - D(a,b,e) + 2D(a,c,d) + 2D(a,c,e) - D(a,d,e) \\
&\quad - D(b,c,d) - D(b,c,e) + 2D(b,d,e) - D(c,d,e), \\
3\delta_Y(a,d) &= -D(a,b,c) + 2D(a,b,d) - D(a,b,e) + 2D(a,c,d) - D(a,c,e) + 2D(a,d,e) \\
&\quad - D(b,c,d) + 2D(b,c,e) - D(b,d,e) - D(c,d,e), \\
3\delta_Y(b,c) &= 2D(a,b,c) - D(a,b,d) - D(a,b,e) - D(a,c,d) - D(a,c,e) + 2D(a,d,e) \\
&\quad + 2D(b,c,d) + 2D(b,c,e) - D(b,d,e) - D(c,d,e), \\
3\delta_Y(b,d) &= -D(a,b,c) + 2D(a,b,d) - D(a,b,e) - D(a,c,d) + 2D(a,c,e) - D(a,d,e) \\
&\quad + 2D(b,c,d) - D(b,c,e) + 2D(b,d,e) - D(c,d,e), \\
3\delta_Y(c,d) &= -D(a,b,c) - D(a,b,d) + 2D(a,b,e) + 2D(a,c,d) - D(a,c,e) - D(a,d,e) \\
&\quad + 2D(b,c,d) - D(b,c,e) - D(b,d,e) + 2D(c,d,e).
\end{aligned}
\tag{9}
\]

It is not hard to see that $D|_Y$ is treelike/equidistant if and only if $\delta_Y$ is
treelike/equidistant. This is the key observation that will allow us to translate the
4-point/ultrametric condition characterizing when a 2-dissimilarity is
treelike/equidistant into conditions for when a 3-dissimilarity is treelike/equidistant.
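The linear algebra behind $\delta_Y$ can be reproduced with exact arithmetic. The sketch below is illustrative only: it assumes $D$ is stored as a dictionary keyed by 3-element `frozenset`s, orders pairs and triples lexicographically as in $u_Y$ and $v_Y$, and uses the pattern visible in the equations in (9), namely coefficient 2 when the pair is contained in or disjoint from the triple and $-1$ otherwise. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def pairs_and_triples(Y):
    """Lexicographically ordered 2-subsets and 3-subsets of the 5-set Y."""
    Y = sorted(Y)
    pairs = [frozenset(p) for p in combinations(Y, 2)]
    triples = [frozenset(t) for t in combinations(Y, 3)]
    return pairs, triples

def matrix_A(Y):
    """Rows indexed by triples, columns by pairs; entry 1 iff pair lies in triple."""
    pairs, triples = pairs_and_triples(Y)
    return [[1 if p <= t else 0 for p in pairs] for t in triples]

def matrix_6A_inverse(Y):
    """Candidate for 6 * A^{-1}: rows indexed by pairs, columns by triples."""
    pairs, triples = pairs_and_triples(Y)
    return [[-1 if len(p & t) == 1 else 2 for t in triples] for p in pairs]

def delta_Y(D, Y):
    """Solve 2 v_Y = A u_Y, i.e. u_Y = (1/3) * (6 A^{-1}) v_Y."""
    pairs, triples = pairs_and_triples(Y)
    M = matrix_6A_inverse(Y)
    return {p: sum(M[i][j] * Fraction(D[t]) for j, t in enumerate(triples)) / 3
            for i, p in enumerate(pairs)}

# Star tree with pendant edge lengths w: D(x,y,z) = w_x + w_y + w_z.
Y = "abcde"
w = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
D = {frozenset(t): sum(w[x] for x in t) for t in combinations(Y, 3)}
delta = delta_Y(D, Y)
print(delta[frozenset("ab")], delta[frozenset("de")])
```

For the star tree the recovered values are the path lengths $w_x + w_y$, as expected when $D$ is induced by a tree.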

Before we do this, we need some condition that ensures that, for any two distinct
$a, b \in X$ and any two distinct $Z, Z' \in \binom{X}{5}$ with $\{a,b\} \subseteq Z \cap Z'$,
we have $\delta_Z(a,b) = \delta_{Z'}(a,b)$. Clearly, it suffices to consider such sets
$Z, Z' \in \binom{X}{5}$ with $|Z \cap Z'| = 4$. In particular, for $Z = \{a,b,c,d,e\}$ and
$Z' = (Z \setminus \{e\}) \cup \{e'\}$ we obtain Equation (6), which ensures that the map
$\delta : \binom{X}{2} \to \mathbb{R}$ defined by putting $\delta(a,b) := \delta_Z(a,b)$ for
an arbitrary $Z \in \binom{X}{5}$ with $\{a,b\} \subseteq Z$ is well-defined. And in this case
$\delta_Z$ is treelike/equidistant for all $Z \in \binom{X}{5}$ if and only if $\delta$ is
treelike/equidistant if and only if $D$ is treelike/equidistant.

We now prove the two assertions of the theorem: (i) Using the equations in (9), it is not
hard to check that the conditions for $\delta$ (to be non-negative, to satisfy the
triangle inequality and the 4-point condition) translate into Inequalities (3), (4) and
(5), respectively. (ii) Again it is not hard to check that, using the equations in (9), the
ultrametric condition on $\delta$ translates into Inequality (7). □
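The translated conditions can also be tested directly on $D$. The following brute-force sketch writes each condition on $\delta$ in terms of $D$ as obtained from the equations in (9) (terms common to both sides are cancelled); it is meant as an illustration under the assumption that $D$ is a dictionary keyed by 3-element `frozenset`s, and the function names and tolerance are hypothetical.

```python
from itertools import combinations, permutations

def well_defined(D, X, tol=1e-9):
    """delta(a,b) computed via {a,b,c,d,e} and via {a,b,c,d,e'} must agree."""
    T = lambda *t: D[frozenset(t)]
    for a, b, c, d, e, f in permutations(X, 6):
        side = lambda z: (2 * T(a, b, z) - T(a, c, z) - T(a, d, z)
                          - T(b, c, z) - T(b, d, z) + 2 * T(c, d, z))
        if abs(side(e) - side(f)) > tol:
            return False
    return True

def is_treelike_3(D, X, tol=1e-9):
    T = lambda *t: D[frozenset(t)]
    for a, b, c, d, e in permutations(X, 5):
        # delta(a,b) >= 0
        if (T(a, c, d) + T(a, c, e) + T(a, d, e) + T(b, c, d) + T(b, c, e) + T(b, d, e)
                > 2 * (T(a, b, c) + T(a, b, d) + T(a, b, e) + T(c, d, e)) + tol):
            return False
        # delta(a,c) <= delta(a,b) + delta(b,c)
        if (2 * (T(a, c, d) + T(a, c, e) + T(b, d, e))
                > T(a, b, c) + T(a, b, d) + T(a, b, e) + T(a, d, e)
                + T(b, c, d) + T(b, c, e) + T(c, d, e) + tol):
            return False
        # 4-point condition for delta on {a,b,c,d}
        if (T(a, c, e) + T(b, d, e)
                > max(T(a, b, e) + T(c, d, e), T(a, d, e) + T(b, c, e)) + tol):
            return False
    return well_defined(D, X, tol)

def is_equidistant_3(D, X, tol=1e-9):
    T = lambda *t: D[frozenset(t)]
    for a, b, c, d, e in permutations(X, 5):
        # delta(a,c) <= max{delta(a,b), delta(b,c)}
        if (T(a, c, d) + T(a, c, e) + T(b, d, e)
                > max(T(a, b, d) + T(a, b, e) + T(c, d, e),
                      T(a, d, e) + T(b, c, d) + T(b, c, e)) + tol):
            return False
    return well_defined(D, X, tol)

# Star tree on six leaves with pendant edge lengths 1..6.
X6 = "abcdef"
w = {x: i + 1 for i, x in enumerate(X6)}
star = {frozenset(t): sum(w[x] for x in t) for t in combinations(X6, 3)}
print(is_treelike_3(star, X6), is_equidistant_3(star, X6))
```

The loops run over all ordered tuples, so the running time is polynomial in $|X|$ but far from optimized.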



Note that Bocci and Cools (2009, Theorem 3.2) showed that there exists a family of maps
$\phi_k$ from the set of all treelike 2-dissimilarities to the set of all treelike
$k$-dissimilarities that maps a 2-dissimilarity $D^2_{(T,\omega)}$ induced by a weighted
phylogenetic tree $(T,\omega)$ on $X$ with $\omega$ non-negative to the treelike
$k$-dissimilarity $\phi_k(D^2_{(T,\omega)}) = D^k_{(T,\omega)}$. For $k = 3$ this map can be
thought of as multiplication with the matrix $A$ as considered in the proof of Theorem 7,
but for $k \geq 4$ it appears that no such simple representation is possible.
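For $k = 3$ the map is simple enough to write down: by the system $2v_Y = A \cdot u_Y$ from the proof of Theorem 7, the value on a triple is half the sum of the three pairwise values. A minimal sketch, assuming dictionaries keyed by `frozenset`s and a hypothetical function name `phi3`:

```python
from fractions import Fraction
from itertools import combinations

def phi3(D2, X):
    """Map a 2-dissimilarity to the 3-dissimilarity D3(x,y,z) = half the perimeter."""
    D3 = {}
    for t in combinations(sorted(X), 3):
        perimeter = sum(Fraction(D2[frozenset(p)]) for p in combinations(t, 2))
        D3[frozenset(t)] = perimeter / 2
    return D3

# Tree with cherries {a,b} and {c,d}, all five edges of length 1.
X = ["a", "b", "c", "d"]
D2 = {frozenset(p): 3 for p in combinations(X, 2)}
D2[frozenset("ab")] = 2
D2[frozenset("cd")] = 2
D3 = phi3(D2, X)
print(D3[frozenset("abc")])
```

In this example every three leaves span four of the five unit-length edges, so each triple receives the value 4.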

Indeed, the key observation used in proving the above result is that the restriction of
any treelike/equidistant 3-dissimilarity $D$ to every 5-element subset $Y \subseteq X$ can
be related to a unique treelike/equidistant 2-dissimilarity on $Y$ by a system of linear
equations (as this allowed the straightforward translation of the 4-point/ultrametric
condition into a 5-point condition). Unfortunately, it seems that there are problems
when we try to apply this idea in case $k \geq 4$, even for $k = 4$.

More specifically, first note that, although the restriction of any treelike
4-dissimilarity $D$ to every 6-element subset $Y \subseteq X$ can be related to a treelike
2-dissimilarity on $Y$ by a system of linear equations, to do this one has to select a
suitable ordering of the elements in $Y$ (see also Bocci and Cools (2009, Theorem 2.2)).
In contrast, in the case $k = 3$ any ordering works. Moreover, the system of linear
equations does not need to have a unique solution, that is, even after fixing a suitable
ordering, there can be more than one 2-dissimilarity on $Y$ associated to the
4-dissimilarity $D$. This further complicates the translation of the 4-point condition
into some form of 8-point condition for when $D$ is treelike.